
Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views

Neural Information Processing Systems

Learning object-centric representations of multi-object scenes is a promising approach towards machine intelligence, facilitating high-level reasoning and control from visual sensory data. However, current approaches for \textit{unsupervised object-centric scene representation} are incapable of aggregating information from multiple observations of a scene. As a result, these ``single-view'' methods form their representations of a 3D scene based only on a single 2D observation (view). Naturally, this leads to several inaccuracies, with these methods falling victim to single-view spatial ambiguities. To address this, we propose \textit{The Multi-View and Multi-Object Network (MulMON)}---a method for learning accurate, object-centric representations of multi-object scenes by leveraging multiple views.



KGN-Pro: Keypoint-Based Grasp Prediction through Probabilistic 2D-3D Correspondence Learning

Chen, Bingran, Li, Baorun, Yang, Jian, Liu, Yong, Zhai, Guangyao

arXiv.org Artificial Intelligence

High-level robotic manipulation tasks demand flexible 6-DoF grasp estimation to serve as a basic function. Previous approaches either directly generate grasps from point-cloud data, suffering from challenges with small objects and sensor noise, or infer 3D information from RGB images, which introduces expensive annotation requirements and discretization issues. Recent methods mitigate some challenges by retaining a 2D representation to estimate grasp keypoints and applying Perspective-n-Point (PnP) algorithms to compute 6-DoF poses. However, these methods are limited by their non-differentiable nature and reliance solely on 2D supervision, which hinders the full exploitation of rich 3D information. In this work, we present KGN-Pro, a novel grasping network that preserves the efficiency and fine-grained object grasping of previous KGNs while integrating direct 3D optimization through probabilistic PnP layers. KGN-Pro encodes paired RGB-D images to generate Keypoint Map, and further outputs a 2D confidence map to weight keypoint contributions during re-projection error minimization. By modeling the weighted sum of squared re-projection errors probabilistically, the network effectively transmits 3D supervision to its 2D keypoint predictions, enabling end-to-end learning. Experiments on both simulated and real-world platforms demonstrate that KGN-Pro outperforms existing methods in terms of grasp cover rate and success rate.
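The core quantity the abstract describes is a confidence-weighted sum of squared re-projection errors, where the predicted 2D confidence map down-weights unreliable keypoints during pose optimization. A minimal NumPy sketch of that objective follows; the function names, pinhole model, and shapes are illustrative assumptions, not KGN-Pro's actual implementation:

```python
import numpy as np

def project(points_3d, R, t, K):
    """Pinhole projection of Nx3 world points with rotation R (3x3),
    translation t (3,), and camera intrinsics K (3x3)."""
    cam = points_3d @ R.T + t          # world frame -> camera frame
    uv = cam @ K.T                     # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide -> pixel coords

def weighted_reproj_error(points_3d, keypoints_2d, weights, R, t, K):
    """Confidence-weighted sum of squared re-projection errors.

    `weights` plays the role of the 2D confidence map values: keypoints the
    network is unsure about contribute less to the pose objective.
    """
    residuals = project(points_3d, R, t, K) - keypoints_2d
    return np.sum(weights * np.sum(residuals**2, axis=1))
```

Treating this weighted objective probabilistically (as a likelihood over poses) is what lets gradients from 3D pose supervision flow back into the 2D keypoint and confidence predictions.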


Review for NeurIPS paper: Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views

Neural Information Processing Systems

They also draw inspiration from prior work on iterative inference for VAEs and propose an inference mechanism that allows the model to efficiently learn an object-centric scene representation from multiple views of a scene containing multiple objects. During training, the model learns to infer [1,...,K] objects (d-dimensional Gaussian latents) in the scene, where K upper-bounds the number of objects the model can recognize and is set to a sufficiently high value. Five views of a scene are presented during training, and the model is expected to reconstruct both the final rendering and the object segmentations for a randomly queried novel viewpoint. They evaluate their model on GQN-Jaco and two variants of the CLEVR dataset. They compare their model to IODINE and GQN on object segmentation, novel queried-viewpoint prediction, and disentanglement analysis; the results show that their method performs better both quantitatively and qualitatively. They also demonstrate that their model learns well-disentangled feature-level representations.
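The multi-view aggregation the review describes has a simple schematic structure: the refined posterior over the K object latents after view t seeds inference on view t+1, so evidence accumulates across observations. The toy loop below illustrates only that control flow; the update rule, shapes, and names are placeholders, not MulMON's actual inference network:

```python
import numpy as np

rng = np.random.default_rng(0)

K, d = 7, 16                        # K upper-bounds the object count; d-dim latents
mu = np.zeros((K, d))               # posterior means over the K object latents
views = rng.normal(size=(5, d))     # stand-in features for the 5 training views

def iterative_refine(mu, view_feat, n_steps=3, lr=0.5):
    """Toy stand-in for iterative amortized inference: pull each object
    latent toward the current view's evidence for a few refinement steps."""
    for _ in range(n_steps):
        mu = mu + lr * (view_feat[None, :] - mu)
    return mu

# Aggregate evidence one view at a time: the posterior refined on view t
# becomes the prior for view t+1, which is exactly what single-view
# methods cannot do.
for v in views:
    mu = iterative_refine(mu, v)
```

The key design point is that the per-view refinement is cheap and amortized, so adding views incrementally sharpens the object latents rather than requiring joint inference over all views at once.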



GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

Gao, Gege, Liu, Weiyang, Chen, Anpei, Geiger, Andreas, Schölkopf, Bernhard

arXiv.org Artificial Intelligence

As pretrained text-to-image diffusion models become increasingly powerful, recent efforts have been made to distill knowledge from these text-to-image pretrained models for optimizing a text-guided 3D model. Most of the existing methods generate a holistic 3D model from a plain text input. This can be problematic when the text describes a complex scene with multiple objects, because the vectorized text embeddings are inherently unable to capture a complex description with multiple entities and relationships. Holistic 3D modeling of the entire scene further prevents accurate grounding of text entities and concepts. To address this limitation, we propose GraphDreamer, a novel framework to generate compositional 3D scenes from scene graphs, where objects are represented as nodes and their interactions as edges. By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model and is able to fully disentangle different objects without image-level supervision. To facilitate modeling of object-wise relationships, we use signed distance fields as representation and impose a constraint to avoid inter-penetration of objects. To avoid manual scene graph creation, we design a text prompt for ChatGPT to generate scene graphs based on text inputs. We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer in generating high-fidelity compositional 3D scenes with disentangled object entities.
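The inter-penetration constraint the abstract mentions follows directly from the signed distance field representation: a 3D point lies inside an object exactly where that object's SDF is negative, so two objects overlap wherever both SDFs are negative at once. A hedged sketch of such a penalty on sphere SDFs follows; the function names and the product-of-penetration-depths form are illustrative assumptions, not GraphDreamer's exact loss:

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius

def penetration_penalty(points, sdf_a, sdf_b):
    """Penalize sample points that lie inside BOTH objects: where both
    SDFs are negative, the objects inter-penetrate, and the penalty
    grows with the product of the two penetration depths."""
    inside_a = np.maximum(-sdf_a(points), 0.0)   # depth inside object A (0 if outside)
    inside_b = np.maximum(-sdf_b(points), 0.0)   # depth inside object B (0 if outside)
    return np.sum(inside_a * inside_b)
```

Because the penalty is zero wherever either SDF is non-negative, it only activates in genuinely overlapping regions and leaves touching-but-disjoint surfaces unpenalized.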